Phonemic Coding Might Be a Result of Non-Functional Sensory-Motor Coupling Dynamics

Author

Pierre-Yves Oudeyer

Abstract

Human sound systems are invariably phonemically coded. Furthermore, phoneme inventories follow very particular tendencies. Three kinds of approaches have so far been proposed to explain these phenomena: "Chomskyan"/cognitive innatism, morpho-perceptual innatism, and the more recent approach of "language as a complex cultural system which adapts under the pressure of efficient communication". The first two approaches are clearly unsatisfactory, while the third, even if much more convincing, makes many speculative assumptions and has not really answered the question of phonemic coding. We propose here a new hypothesis based on a low-level model of sensory-motor interactions. We show that certain very simple and non-language-specific neural devices allow a population of agents to build signalling systems without any functional pressure. Moreover, these systems are phonemically coded. Using a realistic vowel articulatory synthesizer, we show that the resulting vowel inventories have striking similarities with human vowel systems.

1. The origins of phonemic coding and other related puzzling questions

Human sound systems have very particular properties. First of all, they are phonemically coded. This means that syllables, defined as oscillations of the jaw (MacNeilage, 1998), are composed of re-usable parts. These are called phonemes. Thus, the syllables of a language tend to look like la, li, na, ni, bla, bli, etc. rather than like la, ze, fri, won, etc. This might seem unavoidable to those of us who have a phonetic writing alphabet, but in fact our vocal tract allows us to produce syllable systems in which each syllable is holistically coded and has no part that is also used in another syllable. Yet, unlike writing systems, for which both "phonetic" coding and holistic/pictographic coding exist (e.g. Chinese), all human languages are invariably phonemically coded.
Secondly, the set of re-usable parts of syllable systems, as well as the way they are combined, follows precise and surprising tendencies. For example, our vocal tract allows us to produce hundreds of different vowels. Yet each particular vowel system most often uses only 5 or 6 vowels, and extremely rarely more than 12 (Maddieson and Ladefoged, 1996). Moreover, some vowels appear in these sets much more often than others. For example, most languages contain the vowels [a], [i] and [u] (87 percent of languages), while others are very rare, like [y], [œ] and [ɯ] (5 percent of languages). There are also structural regularities that characterize these sets: for example, if a language contains a back rounded vowel of a certain height, for example an [o], it will usually also contain the front unrounded vowel of the same height.

The questions are then: Why are there these regularities? How did they appear? What are the genetic, glosso-genetic/cultural, and ontogenetic components of this formation process?

Several approaches have already been proposed in the literature. The first one, known as the "post-structuralist" Chomskyan view, defends the idea that our genome contains some sort of program which is supposed to grow a language-specific neural device (the so-called Language Acquisition Device, or LAD) which knows a priori all the algebraic structures of language. This concerns all aspects of language, ranging from syntax (Chomsky, 1958; Archangeli and Langendoen, 1997) to phonetics (Chomsky and Halle, 1968). For example, this neural device is supposed to know that syllables are composed of phonemes, which are made up of combinations of a few binary features such as nasality or roundedness. Learning a particular language then only amounts to tuning a few parameters, such as the on or off state of these features.
It is important to note that in this approach the innate knowledge is completely cognitive, and no reference is made to the morpho-perceptual properties of the human articulatory and perceptual apparatuses. This view is becoming more and more incompatible with neuro-biological findings (which have basically failed to find a LAD), and with genetics and embryology, which tend to show that the genome cannot contain specific and detailed information for the growth of such complex neural devices. Finally, even if it turned out to be true, it would not really answer the questions we asked earlier: it only displaces the problem. How did the genes concerned get there in the course of evolution? Why were they selected? No answer has been proposed by post-structuralist linguistics.

Another approach is that of the "morpho-perceptual" innatists. They argue (Stevens 1972) that the properties of the human articulatory and perceptual systems fully explain the properties of sound systems. More precisely, their theory relies on the fact that the mapping between the articulatory space and the acoustic and then perceptual spaces is highly non-linear: there are a number of "plateaus" separated by sharp boundaries. Each plateau is supposed to naturally define a category. Hence, in this view, phonemic coding and phoneme inventories are direct consequences of the physical properties of the body. Convincing experiments have been conducted on certain stop consonants (Damper 2000) with physical models of the vocal tract and the cochlea. Yet there are flaws in this view. First of all, it gives a poor account of the great diversity that characterizes human languages. All humans have approximately the same articulatory/perceptual mapping, and yet different language communities use different systems of categories.
One could imagine that this is because some "plateaus"/natural categories are simply left unused in certain languages, but perceptual experiments (Kuhl 2000) have shown that, very often, people speaking a language L1 show sharp perceptual non-linearities in some part of the sound space, corresponding to boundaries in their category system, which are not perceived at all by people speaking another language L2. This means, for instance, that Japanese speakers cannot hear the difference between the "l" in "lead" and the "r" in "read". As a consequence, it seems that there are no natural categories, and the results concerning certain stop consonants are most probably anecdotal. Moreover, the physical models of the vocal tract and of our perceptual system that have been developed in the literature (Boersma 1998) clearly show that important parts of the mapping do not look at all like plateaus separated by sharp boundaries. Clearly, considering only the physical properties of the human vocal tract and cochlea is not sufficient to explain both phonemic coding and the structural regularities of sound systems.

A more recent approach proposes that the phenomena we are interested in come from self-organisation processes occurring mainly at the cultural and ontogenetic scales. The basic idea is that sound systems are good solutions to the problem of finding an efficient communicative system given articulatory, perceptual and cognitive constraints, and that good solutions are characterized by the regularities we are trying to explain. This approach was initially defended by Lindblom (1992), who showed for example that if one optimizes the energy of vowel systems, defined as a compromise between articulatory cost and perceptual distinctiveness, one finds systems which follow the structural and frequency regularities of human languages. Schwartz et al. (1997) reproduced these results and extended them to the regularities of CV syllables.
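The idea of scoring a vowel system by an energy that trades distinctiveness against articulatory cost can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the works cited: vowels are points in a hypothetical 2-D space, distinctiveness is penalized with a sum of inverse squared distances between vowel pairs, and the articulatory cost is a hypothetical squared displacement from a neutral position.

```python
import itertools
import math

def dispersion_energy(vowels):
    """Sum of inverse squared distances over all vowel pairs (lower = more distinct)."""
    return sum(1.0 / math.dist(a, b) ** 2
               for a, b in itertools.combinations(vowels, 2))

def articulatory_cost(vowels, neutral=(0.5, 0.5)):
    """Hypothetical cost: squared displacement of each vowel from a neutral position."""
    return sum(math.dist(v, neutral) ** 2 for v in vowels)

def energy(vowels, weight=1.0):
    """Compromise between perceptual distinctiveness and articulatory cost."""
    return dispersion_energy(vowels) + weight * articulatory_cost(vowels)

# Two hypothetical three-vowel systems in the unit square.
crowded = [(0.45, 0.5), (0.5, 0.5), (0.55, 0.5)]  # cheap to produce, hard to tell apart
spread = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]     # costlier to produce, easy to tell apart

energy_crowded = energy(crowded)
energy_spread = energy(spread)
```

With these assumed definitions, the spread system has a much lower energy than the crowded one, because the distinctiveness term dominates when vowels are close together. An explicit optimizer would search the space for configurations that minimize `energy`.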
As far as phonemic coding is concerned, Lindblom made only simple and abstract experiments, in which he showed that the optimal systems, in terms of the compromise between articulatory cost and acoustic distinctiveness, are those in which some of the targets composing syllables are re-used (note that Lindblom presupposes that syllables are sequences of targets, as we also do in this paper). Yet these results were obtained with very low-dimensional and discrete spaces, and it remains to be seen whether they hold for realistic spaces. Lindblom proposed another possible explanation for phonemic coding, the storage cost argument: re-using parts requires less biological material to store the system, and is thus more advantageous. This argument seems weak for two reasons. First, the additional cost of storing unrelated parts is not so great, and there are many examples of cultural systems which are extremely memory-inefficient (for example pictogram-based writing systems). Secondly, it supposes that the possibility of re-using is already there, whereas what "re-using" means and how it is performed by our neural systems is a fundamental question (this is similar to models of the origins of compositionality (Kirby, 1998), which in fact presuppose that the ability to compose basic units is already there, and only show under which conditions it is used or not).

These experiments were a breakthrough compared to innatist theories, but they provide unsatisfying explanations: they rely on explicit optimization procedures, which never occur as such in nature. There are no little scientists in the heads of humans making calculations to find out which vowel system is cheaper. Rather, natural processes adapt and self-organise. Thus, one has to find the processes which formed these sound systems and which can be viewed as optimizations only a posteriori.
It has been proposed by de Boer (2001) that these processes are imitation behaviors among humans/agents. He built a computational model consisting of a society of agents that culturally play the so-called "imitation game". Agents were given a physical model of the vocal tract, a model of the cochlea, and a simple prototype-based cognitive memory. Their memory of prototypes was initially empty and grew through invention and learning from others; scores were used to assess the prototypes and possibly prune the inefficient ones. One round of the game consists in picking two agents, the speaker and the hearer. The speaker utters one sound from its repertoire, and the hearer tries to imitate it. The speaker then evaluates the imitation by checking whether it categorizes the imitation as the item it initially uttered. Finally, it gives feedback to the hearer about the result of this evaluation (good or not). De Boer showed that after a while a society of agents forms a shared vowel system, and that the vowel systems formed follow the structural regularities of human languages. They are somewhat optimal, but this is a side effect of adaptation for efficient communication under articulatory, perceptual and cognitive pressures and biases. These results were extended by Oudeyer (2001b) to syllable systems, where phonological rules were shown to emerge within the same process.

As far as phonemic coding is concerned, Oudeyer (2002) has made experiments which tend to indicate that the conclusions drawn from Lindblom's simple experiments can hardly be extended to realistic settings. It seems that, with realistic articulatory and perceptual spaces, non-phonemically coded syllable systems that are perfectly sufficient for efficient communication emerge easily. Thus it seems that new hypotheses are needed. This paper presents a model that follows a similar approach, yet with a crucial difference: no functional pressure is used here.
Another difference is that the cognitive architecture of the agents we use is modeled at a lower level, namely the neural level. We will show that phonemic coding and shared vowel systems following the right regularities emerge as a consequence of basic sensory-motor coupling on the one hand, and of unsupervised interactions among agents on the other hand. In particular, we will show that phonemic coding can be explained without any reference to the articulatory/perceptual mapping, and yet that this mapping explains some of the structural regularities. The emergent vowel systems will be shown to be highly efficient should they be recruited for communication, even though they were not formed under any communicative pressure. This is a possible example of what has sometimes been termed "exaptation". An important aspect to keep in mind is that the neural devices of our agents are very generic and could be used to learn, for example, hand-eye coordination. Thus they are not at all language-specific, and are at odds with neural devices like the LAD.

2. A low-level model of agents that interact acoustically

The model is a generalization of the one described in (Oudeyer 2001a), which was used to model a particular acoustic illusion called the perceptual magnet effect. (Oudeyer 2001a) also described a first simple experiment which coupled agents and neural maps, but it involved only static sounds/articulations and abstract articulatory models. In particular, the question of phonemic coding was not studied. The present paper extends that work to dynamic articulations, hence complex sounds, and uses both abstract and realistic articulatory models. We also describe the resulting dynamics in detail by introducing entropy-based measures which make it possible to follow precisely what happens. The model is based on topological neural maps.
This type of neural network has been widely used in models of cortical maps (Morasso et al., 1998), which are the neural devices that humans have to represent parts of the outside world (acoustic, visual, tactile, etc.). Our model relies on two neuroscientific findings, initially made popular by the experiments of Georgopoulos (1988). On the one hand, for each neuron/receptive field in the map there exists a stimulus vector to which it responds maximally (the response decreases as stimuli get further from this vector). On the other hand, from the set of activities of all neurons at a given moment one can predict the perceived stimulus or the motor output by computing what is termed the population vector (see Georgopoulos 1988): it is the sum of the preferred vectors of all the neurons, weighted by their activity (it is normalized here, since we are interested in both the direction and the amplitude of the stimulus vector). When there are many neurons and the preferred vectors are uniformly spread across the space, the population vector corresponds accurately to the stimulus that gave rise to the neurons' activities, whereas when the distribution is inhomogeneous some imprecision appears. This imprecision has been the subject of much research, and many people have proposed more precise variants of Georgopoulos' formula (see Abbot and Salinas, 1996), because they assumed that the sensory system codes stimuli exactly (and hence that the formula of Georgopoulos must be somewhat false). On the contrary, we have shown in (Oudeyer 2001a) that this imprecision allows the interpretation of "magnet effect"-like psychological phenomena, i.e. sensory illusions, and so may be a fundamental characteristic of neural maps.
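The contrast between uniform and inhomogeneous maps can be made concrete with a small sketch. It is only an illustration under stated assumptions: stimuli and preferred vectors are reduced to scalars, the tuning curve is assumed to be Gaussian, and the width `SIGMA`, the grid of preferred values and the cluster at 0.6 are hypothetical choices.

```python
import math

def activity(preferred, stimulus, sigma):
    """Assumed Gaussian tuning: response falls off with distance from the preferred value."""
    return math.exp(-((preferred - stimulus) ** 2) / sigma ** 2)

def population_vector(preferred_values, stimulus, sigma):
    """Preferred values weighted by activity, normalized by the total activity."""
    acts = [activity(p, stimulus, sigma) for p in preferred_values]
    total = sum(acts)
    return sum(a * p for a, p in zip(acts, preferred_values)) / total

SIGMA = 0.1  # hypothetical tuning width

# Evenly spread preferred values over [-1, 2] (1-D stand-in for a uniform map).
uniform = [i / 100 for i in range(-100, 201)]
# Same map with a dense cluster of neurons added at 0.6 (inhomogeneous map).
clustered = uniform + [0.6] * 50

stimulus = 0.5
decoded_uniform = population_vector(uniform, stimulus, SIGMA)
decoded_clustered = population_vector(clustered, stimulus, SIGMA)

print(f"uniform map decodes {stimulus} as {decoded_uniform:.3f}")
print(f"clustered map decodes {stimulus} as {decoded_clustered:.3f}")
```

In this toy setting the uniform map recovers the stimulus almost exactly, while the clustered map pulls the decoded value toward the cluster, which is the kind of imprecision the text relates to magnet-like effects.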
Moreover, the neural maps are recurrent, and their relaxation consists in iterating the coding/decoding with the population vector: the imprecision, coupled with a positive feedback loop that forms neuron clusters, will provide well-defined non-trivial attractors which can be interpreted as (phonemic) categories.

A neural map consists of a set of neurons $n_i$ whose "preferred" stimulus vector is denoted $v_i$. The activity of neuron $n_i$ when presented with stimulus $v$ is computed with a Gaussian function:

$$\mathrm{act}(n_i) = e^{-\mathrm{dist}(v_i, v)^2 / \sigma^2} \qquad (1)$$

where $\sigma$ is a parameter of the simulation (to which the simulation is very robust). The population vector is then:
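Following the verbal definition given earlier (the sum of the preferred vectors weighted by their activity, normalized), an assumed reconstruction is $\mathrm{pop}(v) = \sum_i \mathrm{act}(n_i)\, v_i / \sum_i \mathrm{act}(n_i)$. The sketch below uses this form to illustrate the relaxation described above, under simplifying assumptions: stimuli are scalars, the activity is a Gaussian of the squared distance, and the map layout (two clusters at 0.2 and 0.8 over a sparse background), `SIGMA` and the number of steps are hypothetical.

```python
import math

def activity(preferred, stimulus, sigma):
    """Gaussian activity of a neuron (assumed form, squared distance over sigma squared)."""
    return math.exp(-((preferred - stimulus) ** 2) / sigma ** 2)

def decode(preferred_values, stimulus, sigma):
    """Code the stimulus as activities, then decode it with the normalized population vector."""
    acts = [activity(p, stimulus, sigma) for p in preferred_values]
    return sum(a * p for a, p in zip(acts, preferred_values)) / sum(acts)

def relax(preferred_values, stimulus, sigma, steps=200):
    """Iterate coding/decoding: each decoded value is fed back as the next stimulus."""
    for _ in range(steps):
        stimulus = decode(preferred_values, stimulus, sigma)
    return stimulus

SIGMA = 0.15  # hypothetical tuning width

# Hypothetical inhomogeneous map: sparse background plus two dense clusters.
background = [i / 20 for i in range(21)]
neural_map = background + [0.2] * 50 + [0.8] * 50

low_a = relax(neural_map, 0.10, SIGMA)
low_b = relax(neural_map, 0.35, SIGMA)
high = relax(neural_map, 0.65, SIGMA)

print(f"0.10 -> {low_a:.3f}, 0.35 -> {low_b:.3f}, 0.65 -> {high:.3f}")
```

In this toy map, stimuli starting on either side of 0.2 relax to the same point near that cluster, and a stimulus starting at 0.65 relaxes to a point near 0.8, so each cluster behaves like an attractor that could be read as a category.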

Publication date: 2002
